EXECUTIVE SUMMARY


BACKGROUND 

Software development estimating methods, such as the Constructive COst
Model (COCOMO) [1], provide a point estimate of the required effort in
staff months. These estimating methods assume accurate a priori
information for:


a.  the  size  of  the  project  in  thousands  of  lines  of  code,  and 

b. the complexity of the project, the team's skill and familiarity
with the language, the available equipment and tools, etc. These
attributes are rated, and the estimated effort is adjusted
accordingly by these Development Effort Multipliers (DEMs).

In a planning phase, there is significant uncertainty in these
parameters. Consequently, a more appropriate and useful estimate is a
cumulative distribution function (CDF) of effort.


PURPOSE 

The purpose of this study is to compute the effort CDF and point estimate,
given assumed probability density functions (pdfs) for the project's
size and DEMs. In a planning phase, the size and DEM estimates are frequently based
on  analogies  to  similar  functions  from  other  programs.  Although  there  is 
considerable  variability  in  the  historical  data,  it  is  possible  to  specify 
realistic  ranges  for  size  and  DEMs.  Combining  size  and  DEM  uncertainty 
into  a  total  system  effort  model  is  a  difficult  analytical  problem  and 
cannot  be  solved  exactly,  since  the  effort-estimating  model  is  a  nonlinear 
function  of  size,  while  the  DEMs  are  multiplicative  factors  evaluated  for 
the  individual  subsystems. 


RESULTS 

A  procedure  has  been  developed  for  estimating  the  system-level  effort 
point  estimate  and  CDF  in  terms  of  the  means  and  variances  of  size  and  of 
the  DEMs  of  the  subsystems.  Further,  it  is  shown  that  the  CDF  of  effort  is 
determined  to  an  accuracy  which  is  adequate  for  software  effort  estimation 
whenever  the  largest  contributing  variance  is  not  more  than  half  the  total 
variance  of  effort.  The  procedure  is  based  on  multivariable  Taylor  series 
and on the properties of the Central Limit Theorem.




CONCLUSIONS 


The  proposed  solution  is  straightforward  to  implement,  though  there 
are  several  computational  steps;  a  spreadsheet  adequately  supports  the 
calculations.  The  method  is  recommended  for  use  in  planning  estimates  of 
software  systems  consisting  of  multiple  functional  areas,  each  with 
variability  in  the  estimating  parameters. 

SECTION 1


INTRODUCTION 


1.1 BACKGROUND

Software development cost estimation is subject to a variety of
uncertainties, some of which are common to all systems, while others are
unique to software. The requirements for any type of system may change in
nature, scope, or functionality, or may vanish. Further, especially at the
start of a software development process, there is considerable uncertainty
as to the final investment of funds, effort, and schedule. One of the
principal sources of this initial uncertainty is the estimated size of the
software to be developed; the final size of a software system is sometimes
more than twice the original estimate. Another major source of uncertainty
lies in the estimating model of software development cost, for no highly
accurate model exists at present. It is also common to find that the
complexity, or other similar descriptors of a project's character, cannot
be accurately predetermined. In spite of these important uncertainties, the
project manager needs to know what budget has a desired probability of
success, or what the probability of success is for a given budget. These
are provided by the effort Cumulative Distribution Function (CDF).

A popular method for estimating software development cost is the
Constructive COst Model, COCOMO [1]. However, this is in principle only a
method for forming point estimates for effort, staffing, and schedule, and
has no inherent capability for estimating the effects of uncertainty in the
size of the project and its various components, known as computer software
configuration items (CSCIs). Further, this model requires selection of
development effort multipliers (DEMs) to represent the impact of particular
aspects of the project or environment, such as the complexity of the
project or the computation facility in which the development is conducted.
Initially, these attributes are also inevitably uncertain due to incomplete
knowledge of the product, the process, the facility, and the experience and
ability of the staff. Finally, COCOMO has a nontrivial error with
respect to its own database. In consequence, there can be a significant
range in the estimated cost of the system.


1.2 SCOPE

This  report  presents  a  method  for  extending  COCOMO  to  incorporate  the 
uncertainties  of  the  size,  the  rating  values  selected  for  the  DEMs,  and  the 
error  of  the  model  in  determining  the  probability  distribution  of  effort 
for  a  multiple  CSCI  project. 
The reader is assumed to be familiar with calculus and probability
theory. It is not within the scope of this paper to comment on ESD
acquisition and software cost estimation practices. The use of COCOMO is
primarily intended to illustrate application of the method, which is
applicable to any software model of the $aI^b$ form.


1.3  APPROACH 

The  analytical  approach  is  outlined:  a)  the  software  development 
effort  is  expressed  as  a  multivariable  Taylor  series  in  terms  of  the  sizes 
and  the  DEMs  of  the  CSCIs;  b)  the  assumption  that  the  uncertainties  of 
these  variables  are  statistically  independent  is  used  to  approximate  the 
mean  effort  and  its  standard  deviation;  c)  the  Central  Limit  Theorem  (CLT) 
is  then  used  to  assert  that  the  effort  probability  density  function  (pdf) 
and  CDF  are  approximately  normal. 


1.4  DISCUSSION 

The  software  cost  analyst,  when  undertaking  an  uncertainty  analysis, 
typically  assigns  a  range  of  uncertainty  to  the  estimated  size  of  each 
major  component.  It  is,  however,  very  difficult  for  the  analyst  to  state 
with  any  confidence  whether  the  pdf  is  uniform,  triangular,  discrete,  beta, 
or  some  other  form.  A  similar  observation  applies  to  the  DEMs.  It  is  the 
particular  virtue  of  the  CLT  that  it  shows  the  forms  of  the  pdfs  of  the 
contributing  elements  to  be  unimportant:  the  forms  of  the  pdf  and  CDF  for 
the  effort  of  the  entire  project  converge  to  the  familiar  normal,  or 
Gaussian,  pdf  and  CDF  as  the  number  of  CSCIs  and  DEMs  increases.  It  is 
therefore  legitimate  for  the  analyst  to  assume  very  simple  forms  for  the 
pdfs  of  size  or  DEMs.  The  CLT  is  thus  especially  appropriate  and  useful 
for  software  development  effort  uncertainty  analysis. 


1.5  OUTLINE  OF  REPORT 

Section  2  presents  the  general  approach  to  software  development  effort 
uncertainty  analysis.  Subsections  2.1  and  2.2  set  forth,  respectively,  the 
foundations  and  the  theoretical  development  of  this  approach.  Various 
probability  density  functions  considered  suited  to  software  DEMs  and 
size  estimation  will  be  examined;  their  properties  will  be  developed  in 
section  3.  The  validity  of  the  CLT  in  this  application  is  considered  in 
section  4.  The  analyst  who  is  principally  interested  in  the  application 
and  use  of  this  technique,  rather  than  its  development,  may  go  directly  to 
section  5,  where  an  example  is  fully  developed.  The  various  worksheet 
formats  used  in  section  5  are  gathered  in  the  appendix. 

SECTION 2


THE  CENTRAL  LIMIT  THEOREM  AND  COCOMO 


2.1  MATHEMATICAL  FOUNDATIONS 

In  this  section,  analytical  relationships  will  be  developed  from  which 
the  effort  mean  and  standard  deviation  will  be  defined  in  terms  of  the 
means  and  standard  deviations  of  size  of  the  CSCIs  and  of  the  values  of  the 
DEMs.  By  invoking  the  CLT,  the  CDF  for  effort  can  be  formed,  which  enables 
calculation  of  the  probability  that  a  specified  value  of  effort  will  be 
exceeded,  and  the  confidence  level  of  the  estimate  can  be  determined.  The 
approach is developed for Intermediate COCOMO. It can be extended to
Detailed COCOMO, and to any model of the form $aI^b$.

To  use  the  COCOMO  software  development  cost  estimating  model  requires 
the analyst to know the estimated size of each CSCI. In addition, the
analyst  requires  the  appropriate  value  to  use  for  each  of  the  15  or  more 
DEMs.  These  cost-driver  attributes  are  not  easily  evaluated:  for  example, 
ACAP  (analyst  capability)  is  very  low  if  "...the  average  analyst  lies  at 
the  15th  percentile  in  terms  of  ability,  efficiency,  ability  to  communicate 
and  cooperate."  This  is  a  highly  subjective  criterion.  The  other  DEMs 
have  similar  definitions,  with  similar  subjective  evaluations.  To  specify 
a  priori  the  true  value  for  each  of  the  DEMs  is  not  possible.  Thus,  size 
and  DEM  rating  levels  are  best  defined  by  probability  density  functions. 

The  problem  to  be  examined  in  this  report  is  now  stated:  using  the 
COCOMO  equations,  define  the  mean,  the  standard  deviation,  the  pdf,  and  the 
CDF  of  effort  for  a  multi-CSCI  project  in  terms  of  the  uncertainties  of  the 
CSCIs'  sizes  and  DEMs,  and  the  error  of  the  effort  estimating  model. 

The  theoretical  foundations  of  this  approach  are: 

a.  Law  of  Means:  the  mean  of  a  sum  of  random  variables  taken  from 
arbitrary  distributions  equals  the  sum  of  the  means  of  the 
variables; 

b.  Law  of  Variances:  the  variance  of  a  sum  of  independent  random 
variables  taken  from  arbitrary  distributions  equals  the  sum  of  the 
variances  of  the  variables; 

c.  Taylor  Series:  a  function  which  together  with  its  derivatives  is 
continuous  in  an  interval  may  be  exactly  described  everywhere 
within  that  interval  in  terms  of  the  values  of  the  variables,  the 
function,  and  its  derivatives,  which  are  evaluated  at  any  point 
within  the  interval; 

d. CLT: the pdf and CDF of a sum of independent random variables
converge to the normal as the variances of the contributors become
small compared to the variance of the sum; the forms of the pdf
and CDF of the sum do not depend on the forms of the pdf and CDF
of the contributors.
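
Foundations (a) and (b) can be checked exactly on small discrete distributions by enumerating every outcome of the sum. The sketch below is illustrative only: the two distributions `X` and `Y` and the helper names are assumptions chosen for the demonstration and do not come from the report.

```python
from itertools import product

def mean_var(dist):
    """Mean and variance of a discrete distribution given as {value: probability}."""
    m = sum(x * p for x, p in dist.items())
    v = sum((x - m) ** 2 * p for x, p in dist.items())
    return m, v

def sum_dist(d1, d2):
    """Exact distribution of X + Y for independent X ~ d1 and Y ~ d2."""
    out = {}
    for (x, px), (y, py) in product(d1.items(), d2.items()):
        out[x + y] = out.get(x + y, 0.0) + px * py
    return out

# Two hypothetical distributions of deliberately different shape.
X = {1.0: 0.2, 2.0: 0.5, 6.0: 0.3}
Y = {0.0: 0.7, 10.0: 0.3}

mx, vx = mean_var(X)
my, vy = mean_var(Y)
ms, vs = mean_var(sum_dist(X, Y))

print("mean of sum:", ms, " sum of means:", mx + my)
print("variance of sum:", vs, " sum of variances:", vx + vy)
```

The agreement holds whatever shapes are chosen for the two distributions, which is the property the method relies on when the contributions of many CSCIs are combined.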

The  independent  variables,  assumed  in  this  section,  are: 

a. The size of the software for each CSCI, measured in thousands of
delivered source instructions (KDSI);

b. The rating levels of the DEMs chosen for each CSCI.

The  dependent  variable  is  development  effort,  measured  in  staff 
months  (SM). 


2.2 APPROACH

The  approach  is  outlined: 

a.  An  expression  will  be  formed  which  defines  effort  in  terms  of  the 
DEMs  and  the  size  in  KDSI  of  the  CSCIs; 

b.  Truncated  Taylor  series  will  be  formed  to  define  effort  in  terms 
of  the  independent  variables; 

c.  The  derivatives  of  the  expressions  in  (a)  will  be  formed  and 
substituted  into  the  Taylor  series; 

d.  The  mean  and  variance  of  effort  will  then  be  calculated  under  the 
assumption  that  the  independent  variables  (size  and  DEMs)  are 
random  and  statistically  independent; 

e.  Finally,  by  invoking  the  CLT,  it  will  be  shown  that  the  pdf  and 
CDF  of  effort  are  normal  under  certain  conditions,  and  this  fact 
will  be  used  to  specify  the  probability  that  the  effort  estimate 
will  not  be  exceeded. 

The  procedure  outlined  above  is  now  set  forth  in  detail. 


2.2.1  Notation 

$D$                 Development effort in project, SM

$D_i$               Development effort in CSCI i, SM

$N$                 Nominal SM of effort in project

$N_i$               Nominal SM of effort in CSCI i

$I$                 Number of KDSI in project

$I_i$               Number of KDSI in CSCI i

$\sigma_{I_i}$      Standard deviation of $I_i$

$M_{ik}$            Value of DEM k in CSCI i

$\sigma_{M_{ik}}$   Standard deviation of $M_{ik}$

$M_i$               Effort adjustment factor (product of all DEMs) in CSCI i

The independent random variables are $I_i$ and $M_{ik}$; all other variables
are dependent. The range of the index, i, is over all CSCIs, while the
range of the index, k, is over all DEMs.


2.2.2  General  Relationships 

The  effort  is  now  defined  in  terms  of  the  sizes  of  the  CSCIs  and  the 
values  of  the  DEMs,  under  the  assumption  that  the  uncertainties  of  the 
system's  DEMs  and  size  are  defined  at  the  level  of  the  CSCIs  rather  than  at 
lower  levels.  This  assumption  in  no  way  limits  the  generality  of  the 
findings,  but  merely  reduces  the  complexity  of  the  notation  and  the 
derivations. 

Following  the  COCOMO  equations,  the  key  relationships  are: 

Total size of project

$$ I = \sum_i I_i \tag{2-1} $$

Computation of nominal SM

$$ N = a I^b \tag{2-2} $$

where the parameters, a and b, are presented in table 2-1; see page 117 of
[1].

Table 2-1. COCOMO Parameters

Development Mode     a      b
----------------     ---    ----
Organic              3.2    1.05
Semi-detached        3.0    1.12
Embedded             2.8    1.20

Computation of Effort Adjustment Factor for CSCI i, see page 125 of [1]:

$$ M_i = \prod_k M_{ik} \tag{2-3} $$

Distribution  of  nominal  SM  among  CSCIs 

$$ N_i = I_i N / I \tag{2-4} $$

The  nominal  effort  is  allocated  to  the  several  CSCIs  in  proportion  to 
their  sizes,  as  per  instructions  4-6  on  page  148  of  [1].  The  method  used 
above  in  (2-4)  for  computing  the  distribution  of  nominal  SM  avoids  the 
quantity  nominal  productivity  used  in  [1],  by  substituting  its  definition. 

Computation of development effort in CSCI i

$$ D_i = N_i M_i \tag{2-5} $$

Computation of project total effort:

$$ D = \sum_i D_i \tag{2-6} $$


Equations  (2-1)  through  (2-6)  completely  define  the  foundations  of 
Intermediate  COCOMO  to  the  extent  required  for  the  uncertainty  analysis. 

As stated, effort as defined by $D$ in (2-6) is the dependent variable, while
the sizes of the CSCIs ($I_i$) in (2-1) and the DEMs' values given as $M_{ik}$ in
(2-3) are the independent variables. These independent variables are
random and statistically independent.
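
The point estimate defined by (2-1) through (2-6) can be sketched in a few lines. This is a minimal illustration, not a full Intermediate COCOMO implementation: the function name, the argument layout, and the two-CSCI example values are assumptions; only the (a, b) pairs are taken from table 2-1.

```python
from math import prod

# (a, b) pairs from table 2-1.
COCOMO_PARAMS = {
    "organic": (3.2, 1.05),
    "semi-detached": (3.0, 1.12),
    "embedded": (2.8, 1.20),
}

def cocomo_effort(sizes_kdsi, dems, mode="embedded"):
    """Return (D, [D_i]) in staff months following (2-1) through (2-6).

    sizes_kdsi[i] is I_i; dems[i][k] is M_ik.
    """
    a, b = COCOMO_PARAMS[mode]
    total_size = sum(sizes_kdsi)                                  # (2-1)
    nominal = a * total_size ** b                                 # (2-2)
    eaf = [prod(row) for row in dems]                             # (2-3)
    nominal_i = [s * nominal / total_size for s in sizes_kdsi]    # (2-4)
    effort_i = [n * m for n, m in zip(nominal_i, eaf)]            # (2-5)
    return sum(effort_i), effort_i                                # (2-6)

# Hypothetical two-CSCI project; the DEM values are illustrative only.
total, per_csci = cocomo_effort([10.0, 30.0], [[1.15, 0.91], [1.0, 1.08]])
print(round(total, 1), [round(d, 1) for d in per_csci])
```

Note that the nominal effort is computed once from the total size and then apportioned, so a change in the size of one CSCI changes the effort of every CSCI.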

2.2.3 Taylor Series


A  Taylor  series  is  formed  for  effort  as  a  function  of  size,  and  of  the 
DEMs,  and  is  truncated  after  the  second  derivative  terms.  The  error  due  to 
this  truncation  is  demonstrated  in  section  4.  The  general  form  for  this 
truncated  multivariable  Taylor  series  is 


$$ z = z_0 + \sum_i \frac{\partial z}{\partial x_i}(x_i - x_{i0}) + \frac{1}{2!} \sum_i \sum_j \frac{\partial^2 z}{\partial x_i \partial x_j}(x_i - x_{i0})(x_j - x_{j0}) \tag{2-7} $$


where $z$ is the dependent variable, $x_i$ and $x_j$ are the independent variables
and thus may be either size or DEMs, and the subscript 0 denotes the
multidimensional point about which the series is expanded. The indices
range over all variables. In this case, the series is to be expanded about
the mean values of the independent variables ($I_i$ and $M_{ik}$), and takes the
form

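$$
\begin{aligned}
D = D_0 &+ \sum_i \frac{\partial D}{\partial I_i}(I_i - \bar{I}_i) + \sum_i \sum_k \frac{\partial D}{\partial M_{ik}}(M_{ik} - \bar{M}_{ik}) \\
&+ \frac{1}{2!} \sum_i \sum_j \frac{\partial^2 D}{\partial I_i \partial I_j}(I_i - \bar{I}_i)(I_j - \bar{I}_j) \\
&+ \frac{1}{2!} \sum_i \sum_j \sum_k \frac{\partial^2 D}{\partial I_i \partial M_{jk}}(I_i - \bar{I}_i)(M_{jk} - \bar{M}_{jk}) \\
&+ \frac{1}{2!} \sum_i \sum_j \sum_k \sum_l \frac{\partial^2 D}{\partial M_{ik} \partial M_{jl}}(M_{ik} - \bar{M}_{ik})(M_{jl} - \bar{M}_{jl}) + \cdots
\end{aligned}
\tag{2-8}
$$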


$D_0$ is the usual point estimate of the development effort based on mean
sizes and mean values of the DEMs, in accordance with (2-1) - (2-6). The
partial derivatives are to be evaluated at the mean values of size and
DEMs. The overbar denotes the mean values for the sizes, $\bar{I}_i$ or $\bar{I}_j$, and for
the DEMs, $\bar{M}_{ik}$ or $\bar{M}_{jk}$.


We  now  may  formulate  expressions  for  the  mean  value  of  D  and  its 
variance,  and  evaluate  (2-8)  under  the  following  assumptions: 


a.  The  uncertainty  of  size  of  any  CSCI  is  independent  of  the 
uncertainty  of  size  of  the  other  CSCIs; 

b.  The  uncertainty  of  size  of  any  CSCI  is  independent  of  the 
uncertainties  of  the  DEMs  for  all  CSCIs; 

c. The uncertainty of the value of any DEM is independent of the
uncertainties of all other DEMs, so that all the random variables
are mutually statistically independent.

The mean value of (2-8) is now determined. The three terms in (2-7)
expand into the six terms of (2-8) due to the presence of variables of two
types ($I_i$ and $M_{ik}$); it will be shown that the assumptions noted above cause
four terms to vanish, and the procedure for evaluating the other two will
be demonstrated. Consider the first term on the right of (2-8); as stated
above, this is the development effort point estimate determined by (2-1)
through (2-6) evaluated with the mean values of size and DEMs.

Now consider the second and third terms, which appear on the first
line of (2-8) and are the first derivative terms. The expected value of
$(I_i - \bar{I}_i)$ is zero since the expected value of $I_i$ is $\bar{I}_i$; the expansion is
about the mean of the variables, $\bar{I}_i$, therefore the second term vanishes.
By the same reasoning, the expected value of $(M_{ik} - \bar{M}_{ik})$ is zero, and that
term vanishes also.


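The fourth term on the right of (2-8), in the second line, is reduced
by the expectation operator and assumption (a), above, to the mean value of

$$ \frac{1}{2} \sum_i \frac{\partial^2 D}{\partial I_i^2}(I_i - \bar{I}_i)^2, \quad \text{which is} \quad \frac{1}{2} \sum_i \frac{\partial^2 D}{\partial I_i^2}\,\sigma^2_{I_i}, $$

since the expected value of the product $(I_i - \bar{I}_i)(I_j - \bar{I}_j)$ is zero unless i=j.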


Assumption (b) causes the fifth term, on the third line of (2-8), to
vanish in the presence of the expectation operator.


The sixth term on the right of (2-8) is evaluated by using assumption
(c) and the discussion for the fourth term, and yields

$$ \frac{1}{2} \sum_i \sum_k \frac{\partial^2 D}{\partial M_{ik}^2}\,\sigma^2_{M_{ik}}, $$

which vanishes as the second and higher partial derivatives of effort with
respect to $M_{ik}$ are identically zero. Third and higher partial derivatives
of effort with respect to $I_i$ are neglected; the consequent errors will be
discussed in section 4.


Combining the results developed above yields the estimate of the mean
effort

$$ \bar{D} = D_0 + \frac{1}{2} \sum_i \frac{\partial^2 D}{\partial I_i^2}\,\sigma^2_{I_i} \tag{2-9} $$

The second term of (2-9) is relatively small. Its value will be
demonstrated in the example shown in section 4.2.2; see also column 10 of
figure 5-6.

The  variance  of  effort  is  now  determined.  The  assumptions  of  mutual 
independence  of  the  random  variables  will  be  used  in  conjunction  with  the 
properties  of  the  expectation  operator.  Subtract  (2-9)  from  (2-8);  this 
yields  an  expression  for  the  uncertainty  of  effort  about  the  mean.  Then 
square  and  take  the  expectation;  these  steps  yield  the  variance  of  effort 
as 


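$$ \sigma^2_D = \sum_i \left(\frac{\partial D}{\partial I_i}\right)^2 \sigma^2_{I_i} + \sum_i \sum_k \left(\frac{\partial D}{\partial M_{ik}}\right)^2 \sigma^2_{M_{ik}} \tag{2-10} $$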


In principle, the variance must also depend on all the odd higher
derivatives with respect to $I_i$ and $M_{ik}$. But these are identically zero for
$M_{ik}$, while the derivatives with respect to $I_i$ are very small. The examples
in section 4 will show the error of this approach for simple cases.


2.2.4  Derivatives 


The  following  partial  derivatives  are  required: 

$$ \frac{\partial D}{\partial I_i}, \qquad \frac{\partial D}{\partial M_{ik}}, \qquad \frac{\partial^2 D}{\partial I_i^2} $$

Consider  first  the  partial  derivative  with  respect  to  the  size  of 
CSCI  i.  Relationships  (2-1)  -  (2-4)  enable  formation  of  this  partial 
derivative,  which  is  a  size-sensitivity  coefficient,  as 


$$ \frac{\partial D}{\partial I_i} = \frac{(b-1) D_0 + N M_i}{I} \tag{2-11} $$


In forming this derivative it is essential to remember that $I_i$ is included
in $I$, the total size of the project, as shown in (2-1).
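
The intermediate steps behind the size-sensitivity coefficient are not written out in the report. The following worked derivation is a reconstruction from (2-1) through (2-6) and is offered as a check rather than as the author's own working; it makes explicit where the dependence of $I$ on $I_i$ enters.

```latex
% Combine (2-2), (2-4), (2-5), (2-6): every CSCI shares the factor N/I = a I^{b-1}.
\begin{align*}
D &= \sum_j \frac{I_j N}{I}\, M_j = a I^{\,b-1} \sum_j I_j M_j ,
   \qquad \frac{\partial I}{\partial I_i} = 1 \quad \text{by (2-1)}, \\[4pt]
\frac{\partial D}{\partial I_i}
  &= a (b-1) I^{\,b-2} \sum_j I_j M_j \;+\; a I^{\,b-1} M_i \\
  &= \frac{(b-1)\, D}{I} + \frac{N M_i}{I}
   \;=\; \frac{(b-1) D_0 + N M_i}{I}
   \quad \text{when evaluated at the mean values.}
\end{align*}
```

The first term reflects the growth of the shared nominal productivity factor with total size; the second is the direct contribution of CSCI i.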


Partial differentiation of (2-11) with respect to $I_i$, and use of (2-1)
through (2-6) and (2-11), yields the required second derivative of effort
with respect to size

$$ \frac{\partial^2 D}{\partial I_i^2} = \frac{(b-1)\left[(b-2) D_0 + 2 N M_i\right]}{I^2} \tag{2-12} $$

Similarly, the derivative of effort with respect to a DEM is

$$ \frac{\partial D}{\partial M_{ik}} = N_i \prod_{j \ne k} M_{ij} = \frac{D_i}{M_{ik}} \tag{2-13} $$


As  stated  after  (2-7),  the  partial  derivatives  must  be  evaluated  with  mean 
sizes  and  mean  values  of  the  DEMs. 


Substitution  of  (2-12)  into  (2-9),  and  of  (2-11)  and  (2-13)  into 
(2-10),  yields  the  estimated  mean  development  effort  as 

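$$ \bar{D} = D_0 + \frac{1}{2} \sum_i \frac{(b-1)\left[(b-2) D_0 + 2 N M_i\right]}{I^2}\,\sigma^2_{I_i} \tag{2-14} $$

and the variance of development effort as

$$ \sigma^2_D = \sum_i \left[\frac{(b-1) D_0 + N M_i}{I}\right]^2 \sigma^2_{I_i} + \sum_i \sum_k \left(\frac{D_i}{M_{ik}}\right)^2 \sigma^2_{M_{ik}} \tag{2-15} $$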


The mean effort and the variance of effort have thus been defined in
terms of quantities which are, or can be made to be, readily available in a
computational format for Intermediate COCOMO.
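
A minimal sketch of how (2-14) and (2-15) might be evaluated is given below. The function name and argument layout are assumptions made for illustration; the report itself carries out these steps on worksheets. All quantities are evaluated at the mean sizes and mean DEM values.

```python
from math import prod, sqrt

def effort_mean_and_sigma(a, b, size_mean, size_var, dem_mean, dem_var):
    """Evaluate (2-14) and (2-15).

    size_mean[i], size_var[i]: mean and variance of I_i (KDSI).
    dem_mean[i][k], dem_var[i][k]: mean and variance of M_ik.
    Returns (mean effort, standard deviation of effort) in staff months.
    """
    total_size = sum(size_mean)                          # I, from (2-1)
    nominal = a * total_size ** b                        # N, from (2-2)
    eaf = [prod(row) for row in dem_mean]                # M_i, from (2-3)
    d_i = [s * nominal / total_size * m for s, m in zip(size_mean, eaf)]
    d0 = sum(d_i)                                        # point estimate D_0

    # (2-14): second-order correction to the mean from size uncertainty.
    mean = d0 + 0.5 * sum(
        (b - 1) * ((b - 2) * d0 + 2 * nominal * m) / total_size ** 2 * v
        for m, v in zip(eaf, size_var)
    )

    # (2-15): size contributions, then DEM contributions.
    var_size = sum(
        (((b - 1) * d0 + nominal * m) / total_size) ** 2 * v
        for m, v in zip(eaf, size_var)
    )
    var_dem = sum(
        (d / mk) ** 2 * vk
        for d, row_m, row_v in zip(d_i, dem_mean, dem_var)
        for mk, vk in zip(row_m, row_v)
    )
    return mean, sqrt(var_size + var_dem)

# Single CSCI, size uniform on 4 to 16 KDSI (mean 10, variance 12),
# one nominal DEM with no uncertainty, Embedded mode.
mean, sigma = effort_mean_and_sigma(2.8, 1.20, [10.0], [12.0], [[1.0]], [[0.0]])
print(round(mean, 2), round(sigma, 1))
```

For this single-CSCI case the result can be compared with the Taylor series entries in the first row of table 4-1.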


2.2.5  Cumulative  Distribution  Function  of  Effort 


The procedures described above enable formation of the variance and
mean value of the estimated SM of effort in terms of the means and
variances of size of the CSCIs and of the DEMs. In addition to the
uncertainties of the size and DEMs, the error of the COCOMO model may be
included in the determination of the variance of the estimate. The CLT
justifies disregarding whether the model's errors are exactly normally
distributed. The errors of the COCOMO model are independent of the errors
due to uncertainties of size or of DEMs, and therefore the variance of the
model's estimate of effort may be added to those due to size and DEMs to
present an overall assessment of the variance of effort. The CLT is now
used to assert that, as the number of independent random variables
increases, the pdf and CDF of their sum converge to the normal pdf and CDF,
which are fully determined by the mean and variance of effort. Any packaged
program or standard table of the normal probability integral may be used to
calculate the CDF; see also the appendix.
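
Once the mean and standard deviation of effort are in hand, the two questions a project manager asks (the probability of success for a given budget, and the budget for a desired probability) follow directly from the normal CDF. The sketch below uses only the Python standard library; the function names and the numerical values in the example are hypothetical.

```python
from math import erf, sqrt
from statistics import NormalDist

def total_sigma(sigma_params_sm, sigma_model_sm):
    """Combine independent parameter and model-error standard deviations."""
    return sqrt(sigma_params_sm ** 2 + sigma_model_sm ** 2)

def prob_not_exceeded(budget_sm, mean_sm, sigma_sm):
    """P(effort <= budget) under the normal approximation."""
    return 0.5 * (1.0 + erf((budget_sm - mean_sm) / (sigma_sm * sqrt(2.0))))

def budget_for_confidence(confidence, mean_sm, sigma_sm):
    """Effort level that is not exceeded with the given probability (0 < p < 1)."""
    return NormalDist(mean_sm, sigma_sm).inv_cdf(confidence)

# Hypothetical project: mean 1000 SM; 120 SM from size and DEM uncertainty,
# 160 SM assumed for the model error.
sigma = total_sigma(120.0, 160.0)
print(sigma)
print(prob_not_exceeded(1200.0, 1000.0, sigma))
print(budget_for_confidence(0.90, 1000.0, sigma))
```

Variances, not standard deviations, are added when the model error is included, which is why `total_sigma` takes a root sum of squares.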

SECTION 3


PROBABILITY  DENSITY  FUNCTIONS  FOR  PARAMETER  UNCERTAINTIES 


It  was  assumed  in  section  2  that  the  mean  and  standard  deviation  for 
pdfs  appropriate  for  software  risk  analysis  are  available.  In  the  present 
section, these properties will be derived for uniform, triangular, and
discrete pdfs. Discrete pdfs are especially appropriate for characterizing the
uncertainties  of  DEMs.  It  is  considered  that  more  sophisticated  pdfs  such 
as  beta  are  inappropriate  for  software  analysis  as  they  require  the  analyst 
to  have  great  insight  into  the  nature  of  the  uncertainties  of  size  or  of 
the  DEMs.  In  reality,  it  is  difficult  to  select  reasonable  upper  and  lower 
bounds  for  a  uniform  distribution,  which  has  only  two  parameters!  Moreover, 
the  effect  of  the  CLT  eliminates  the  significance  of  fine  distinctions  in 
the  forms  of  assumed  pdfs.  The  validity  of  the  very  simple  geometric  forms 
for  analysis  using  the  CLT  will  be  demonstrated  in  section  5. 

The  use  of  Taylor  series  in  section  2  requires  that  the  independent 
variables  be  continuous.  This  is  not  strictly  true  for  size,  since  the 
size  of  a  piece  of  code  is  defined  in  lines  of  code,  which  is  an  integer 
quantity; it is hard to conceive of a fraction of a line of code.
Nonetheless, as the minimum increment of one line is 0.001 KDSI, it is reasonable
to  assume  that  effort  is  a  continuous  function  of  the  continuous  variable 
of  size.  The  DEMs  are  fundamentally  continuous  variables;  however,  they  are 
known  only  at  the  specific  tabulated  points  and  therefore  a  discrete  pdf 
structure  is  useful. 


3.1  UNIFORM  DISTRIBUTION 

A  uniform  probability  density  function  has  the  definition 

$$ f(x) = \begin{cases} 0, & x < L \\ 1/(H-L), & L \le x \le H \\ 0, & H < x \end{cases} \tag{3-1} $$

where  L  Minimum value of x
       H  Maximum value of x
       x  Statistical variable

The  mean  of  this  pdf  is  easily  found  from  the  geometry  to  be 

$$ \bar{x} = (H-L)/2 + L = (H+L)/2 \tag{3-2} $$




The variance is

$$ \sigma^2 = \int_L^H x^2 f(x)\,dx - (\bar{x})^2 = (H-L)^2/12 $$
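
The variance result follows in two lines from the definition of the uniform pdf in (3-1) and the mean in (3-2). The steps below are filled in here for completeness; they do not appear in the report.

```latex
\begin{align*}
\int_L^H x^2 f(x)\,dx
  &= \frac{1}{H-L}\cdot\frac{H^3 - L^3}{3}
   = \frac{H^2 + HL + L^2}{3}, \\[4pt]
\sigma^2
  &= \frac{H^2 + HL + L^2}{3} - \frac{(H+L)^2}{4}
   = \frac{4H^2 + 4HL + 4L^2 - 3H^2 - 6HL - 3L^2}{12}
   = \frac{(H-L)^2}{12}.
\end{align*}
```

For example, a size range of 4 to 16 KDSI gives a mean of 10 KDSI and a variance of 12 KDSI squared.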


3.2  TRIANGULAR  DISTRIBUTION 

The mean and variance of a general triangular pdf are derived. Figure
3-1 shows a triangular pdf and its notation.


Notation 

f  Probability  density 
L  Minimum  value  of  x 
M  Most  probable  value  of  x,  mode 
H  Maximum  value  of  x 

x  Statistical  variable,  thousands  of  lines  of  code 

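The mean is derived according to the rule,

$$ \bar{x} = \int_{-\infty}^{\infty} x f(x)\,dx = \int_L^M x f_-(x)\,dx + \int_M^H x f_+(x)\,dx = (L+M+H)/3 $$

where $\bar{x}$ is the mean, and $f_-$ and $f_+$ are the branches of the pdf below
and above the mode.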


The variance is calculated according to the rule

$$ \sigma^2 = \int_{-\infty}^{\infty} (x-\bar{x})^2 f(x)\,dx = \int_L^M (x-\bar{x})^2 f_-(x)\,dx + \int_M^H (x-\bar{x})^2 f_+(x)\,dx = \left[(L^2+M^2+H^2) - (LM+LH+MH)\right]/18 \tag{3-6} $$

A right-triangle pdf is sometimes used in software size analysis. The
mean and variance for this special form are, if L=M is assumed,

$$ \bar{x} = (2L+H)/3 \tag{3-7} $$

and

$$ \sigma^2 = (H-L)^2/18 \tag{3-8} $$
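
The triangular results reduce to a few lines of arithmetic. The sketch below (the function name is an assumption) evaluates the general mean and the variance of (3-6), and confirms that setting L=M recovers the right-triangle forms (3-7) and (3-8).

```python
def triangular_moments(low, mode, high):
    """Mean and variance of a triangular pdf with the given minimum, mode, maximum."""
    mean = (low + mode + high) / 3.0
    var = ((low ** 2 + mode ** 2 + high ** 2)
           - (low * mode + low * high + mode * high)) / 18.0
    return mean, var

# Right triangle with L = M = 16 KDSI and H = 64 KDSI.
mean_rt, var_rt = triangular_moments(16.0, 16.0, 64.0)
print(mean_rt, (2 * 16.0 + 64.0) / 3.0)      # compare with (3-7)
print(var_rt, (64.0 - 16.0) ** 2 / 18.0)     # compare with (3-8)

# Symmetric triangle 16/40/64 KDSI.
print(triangular_moments(16.0, 40.0, 64.0))
```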


3.3  DISCRETE  DISTRIBUTION 

A discrete distribution is occasionally realistic for size and is
usually ideal for DEMs, as they are given only for specific values in [1].
For example: the probability of DEM rating 1 is $P_1$, of rating 2 is $P_2$, of
rating 3 is $P_3$, etc., where the sum of the $P_k$ must equal unity. Then,
defining

$x_k$  The rating values tabulated in [1] for a DEM
$P_k$  The probability of those rating values for that DEM

the mean value of the DEM is

$$ \bar{x} = \sum_k x_k P_k \tag{3-9} $$

and the variance of the DEM is

$$ \sigma^2 = \sum_k (x_k - \bar{x})^2 P_k \tag{3-10} $$
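
As an illustration of (3-9) and (3-10), suppose the analyst judges that a DEM takes one of three tabulated values with probabilities 0.25, 0.50, and 0.25. The rating values and probabilities below are hypothetical, chosen only to show the arithmetic, and the function name is an assumption.

```python
def discrete_moments(values, probs):
    """Mean (3-9) and variance (3-10) of a discrete pdf."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("the probabilities must sum to unity")
    mean = sum(x * p for x, p in zip(values, probs))
    var = sum((x - mean) ** 2 * p for x, p in zip(values, probs))
    return mean, var

# Hypothetical DEM: three candidate multiplier values and their probabilities.
dem_mean, dem_var = discrete_moments([0.86, 1.00, 1.19], [0.25, 0.50, 0.25])
print(dem_mean, dem_var)
```

Because the tabulated multipliers are not symmetric about the nominal value of 1.00, the mean of the DEM generally differs from its most probable value even when the probabilities are symmetric.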

SECTION 4


ACCURACY  AND  VALIDITY 


This  section  is  devoted  to  resolving  two  important  questions: 

a.  Are  the  means  and  variances  of  effort  determined  with  accuracy 
sufficient  to  be  confidently  used? 

b.  Under  what  conditions  is  the  CDF  derived  by  this  procedure 
sufficiently  accurate? 

These  questions  are  now  considered  in  the  order  set  forth  above. 


4.1  ACCURACY  OF  MEANS  AND  VARIANCES 

The accuracy with which the Taylor series method determines the mean
and variance of a multi-CSCI software system is now considered. The approach
is as follows. There exists a method, elegantly presented in [4] for uniform
and triangular pdfs of size, which enables exact calculation of the mean
and variance of effort for a single CSCI. The mean and variance of a
single  CSCI  determined  by  the  Taylor  series  method  will  therefore  be 
compared  to  the  exact  values.  The  following  rules  will  then  be  used  to 
infer  that  the  results  may  be  extended  to  the  multi-CSCI  case: 

a.  The  mean  of  a  sum  of  random  variables  equals  the  sum  of  the  means 
of  those  variables; 

b.  The  variance  of  a  sum  of  independent  random  variables  equals  the 
sum  of  the  variances  of  the  variables. 

These  rules  are  valid  in  general. 

The mean and standard deviation of effort for several uniform pdfs of
size are calculated by the Taylor series and exact methods and are compared
in table 4-1, where the COCOMO Embedded Mode values a=2.8 and b=1.2 have
been used; this choice of the parameter, b, exhibits the maximum error for
the Taylor series method presented in this report. The errors of the mean
and standard deviation are due to the truncation of the Taylor series.

Table 4-1. Mean and Standard Deviation of Effort
for Several Uniform pdfs of Size

Size Range          Mean Effort            Effort Standard Deviation
Low/High, KDSI      Taylor     Exact       Taylor     Exact
                    Series     Method      Series     Method
--------------      ------     ------      ------     ------
4/16                45.02      45.03       18.4       18.3
16/64               237.6      237.7       97.4       96.8
64/256              1254       1255        514        511
256/1024            6619       6622        2712       2697

The  largest  error  in  the  values  of  the  means  is  less  than  one  part  per 
thousand,  while  the  largest  error  in  the  standard  deviations  is  less  than 
six  parts  per  thousand.  These  errors  are  considered  entirely  satisfactory, 
as  they  are  several  orders  of  magnitude  smaller  than  the  errors  of  the 
COCOMO  model.  These  errors  are  due  to  the  truncation  of  the  Taylor  series. 
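
The comparison in table 4-1 can be reproduced for a single CSCI with nominal DEMs. In the sketch below, the Taylor series values follow (2-14) and (2-15), which for one CSCI reduce to derivatives of $aI^b$. The "exact" values are obtained by integrating $aI^b$ and its square over the uniform pdf in closed form; this direct integration is an assumption made here and is not necessarily the procedure of [4]. The function names are likewise illustrative.

```python
from math import sqrt

A, B = 2.8, 1.20   # Embedded-mode parameters from table 2-1

def taylor_uniform(low, high, a=A, b=B):
    """Truncated Taylor series mean and standard deviation of a*I**b."""
    m = (low + high) / 2.0                      # (3-2)
    var = (high - low) ** 2 / 12.0              # variance of the uniform pdf
    d0 = a * m ** b                             # point estimate
    mean = d0 + 0.5 * a * b * (b - 1) * m ** (b - 2) * var
    sigma = a * b * m ** (b - 1) * sqrt(var)
    return mean, sigma

def exact_uniform(low, high, a=A, b=B):
    """Closed-form moments of a*I**b for I uniform on [low, high]."""
    w = high - low
    m1 = a * (high ** (b + 1) - low ** (b + 1)) / ((b + 1) * w)
    m2 = a * a * (high ** (2 * b + 1) - low ** (2 * b + 1)) / ((2 * b + 1) * w)
    return m1, sqrt(m2 - m1 * m1)

for low, high in [(4, 16), (16, 64), (64, 256), (256, 1024)]:
    t_mean, t_sd = taylor_uniform(low, high)
    e_mean, e_sd = exact_uniform(low, high)
    print(f"{low}/{high}: mean {t_mean:.2f} vs {e_mean:.2f}, "
          f"sd {t_sd:.1f} vs {e_sd:.1f}")
```

Each row of the table uses the same ratio of high to low size, so the relative errors are the same in every row; only the scale changes.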

Similar  results,  shown  in  table  4-2,  are  found  for  a  group  of  various 
triangular  pdfs  of  size.  The  mean  and  standard  deviation  of  effort  for  the 
exact  and  Taylor  series  methods  again  compare  closely.  The  differences  are 
small  compared  to  the  COCOMO  model  error,  with  standard  deviation  of 
approximately  15  to  20  percent  of  the  point  estimate  of  effort. 


Table 4-2. Mean and Standard Deviation of Effort
for Triangular pdfs of Size

Size Range           Mean Effort            Standard Deviation of Effort
Low/Mode/High,       Taylor     Exact       Taylor     Exact
KDSI                 Series     Method      Series     Method
--------------       ------     ------      ------     ------
16/16/64             181.9      181.8       76.0       77.1
16/24/64             199.4      199.4       71.7       72.4
16/32/64             217.5      217.5       69.2       69.4
16/40/64             235.91     235.94      68.8       68.6

The  combined  effects  of  skew  and  truncation  of  the  Taylor  series  appear  in 
the  first  line  of  this  table,  where  the  standard  deviations  differ  by  1.4 
percent.  The  bottom  row  shows  the  effects  of  truncation  only.  The  results 
presented  in  tables  4-1  and  4-2  show  that  the  Taylor  series  method  enables 
calculation  of  the  mean  and  standard  deviation  (or  variance)  of  effort  to 
an  acceptable  degree  of  accuracy  for  uniform  or  triangular  pdfs. 
4.2 CONDITIONS FOR CALCULATING THE CDF


It  will  now  be  demonstrated  that  the  CDF  of  a  multi-CSCI  system  can  be 
determined  to  a  sufficient  degree  of  accuracy  by  combining  the  Taylor 
series  method  with  the  CLT.  The  CLT  shows  that  the  CDF  of  a  sum  of 
independent  random  variables  converges  to  the  normal  CDF  as  the  number  of 
such  variables  increases,  without  regard  to  the  forms  of  the  contributing 
pdfs.  The  approach  which  will  be  used  to  determine  the  conditions  under 
which  the  CLT  may  be  used  in  the  present  context  is  to  show  that  under 
various  worst-case  conditions  the  following  conjecture  is  true: 

The  CDF  of  effort  may  be  calculated  under  the  assumption  that  it 
is  normal  whenever  the  largest  contributing  variance  of  size  or 
of the DEMs does not exceed half the total variance of the
effort.

As  stated,  the  contributing  variances  are  those  of  size  and  of  the  DEMs  in 
the  various  CSCIs;  the  error  of  the  model  is  excluded.  This  condition  is 
viewed  as  sufficient  but  not  necessary.  The  conjecture  will  be  supported 
by  examples  which  satisfy  the  condition  cited  above.  These  examples  are: 

The variables come from two identical, independent pdfs of size
which are either uniform or right-triangular. The DEMs are
assumed to be nominal, and the error of the COCOMO model is
neglected.

In these examples, the two contributing variances are equal, and the
condition for the conjecture is marginally satisfied. Neglecting the
approximately normal error of the model is highly conservative.

The criterion for acceptance of the Taylor series model of the CDF is
as follows: the model will be considered acceptably accurate if its worst
error of estimated effort is less than 5 percent of the point estimate of
effort. In one particular area of interest, the range from the 60th to the
90th percentile, the Taylor series model is accurate within 2 percent,
an error which can be considered negligible for cost-estimating purposes in
the planning phase.
4.2.1 A Sum from Two Uniform pdfs


Assume two identical uniform pdfs, $f_1$ and $f_2$, defined by

\[
f_i(x) =
\begin{cases}
0, & x < L_i \\
1/(H_i - L_i), & L_i \le x \le H_i \\
0, & H_i < x
\end{cases}
\tag{4-1}
\]

where i = 1, 2. Also assume these pdfs are identical so that $L_1 = L_2 = L$,
$H_1 = H_2 = H$, and $R = H_1 - L_1 = H_2 - L_2 = H - L$. The pdf of the sum (S)
of the variates is, see [3], the convolution integral

\[
f(S) = \int_{-\infty}^{\infty} f_1(x)\, f_2(S - x)\, dx
\tag{4-2}
\]

Using the definitions of $f_1$ and $f_2$ and (4-2) yields the pdf of the
sum, S, as

\[
f(S) =
\begin{cases}
0, & S < L_1 + L_2 \\
\int_{L_1+L_2}^{S} dx/R^2 = [S - (L_1 + L_2)]/R^2, & L_1 + L_2 \le S \le L_1 + L_2 + R \\
\int_{S}^{H_1+H_2} dx/R^2 = [H_1 + H_2 - S]/R^2, & L_1 + L_2 + R \le S \le H_1 + H_2 \\
0, & H_1 + H_2 < S
\end{cases}
\tag{4-3}
\]

This triangular pdf is symmetrical about the value $S = L_1 + L_2 + R = H + L$.

Assume that this symmetrical triangular pdf of size has Low=16 KDSI,
Mode=40 KDSI, and High=64 KDSI, and use the COCOMO parameters a=2.8 and
b=1.2. The mean and standard deviation of effort for this case are
presented in table 4-2. These estimated values are now used to form the
CDF, using the assumption that the distribution is normal, consistent with
use of the CLT. This normal CDF is compared in table 4-3, below, with the
exact CDF for the symmetrical triangular pdf defined by (4-3), calculated
by the method of [4].


The  difference  of  greatest  magnitude  (*)  is  0.017,  which  corresponds 
to  an  error  of  effort  of  5  SM.  This  error  of  effort  is  negligibly  small 
compared  to  the  selected  criterion. 
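
The comparison described above can be sketched numerically. The sketch below assumes a single size variate with the symmetrical triangular pdf (16/40/64 KDSI), nominal DEMs, and D = a*I^b, so that the exact CDF of effort is the triangular CDF of size evaluated at I = (D/a)^(1/b); the normal CDF uses the Taylor series mean and standard deviation. This is one straightforward way to obtain the exact CDF and is not claimed to be the procedure of [4]; the function names are illustrative only.

```python
import math

A, B = 2.8, 1.2  # COCOMO coefficients used in this section

def triangular_cdf(x, low, mode, high):
    """CDF of a triangular pdf on [low, high] with the given mode."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    if x <= mode:
        return (x - low) ** 2 / ((high - low) * (mode - low))
    return 1.0 - (high - x) ** 2 / ((high - low) * (high - mode))

def exact_effort_cdf(effort, low, mode, high):
    """P(D <= effort); valid because D = A*I**B increases monotonically with size I."""
    size = (effort / A) ** (1.0 / B)
    return triangular_cdf(size, low, mode, high)

def normal_effort_cdf(effort, low, mode, high):
    """Normal CDF built from the Taylor series mean and std. dev. of effort."""
    mean_size = (low + mode + high) / 3.0
    var_size = (low**2 + mode**2 + high**2
                - (low * mode + low * high + mode * high)) / 18.0
    d0 = A * mean_size ** B
    mean = d0 + 0.5 * B * (B - 1.0) * d0 / mean_size**2 * var_size
    std = B * d0 / mean_size * math.sqrt(var_size)
    return 0.5 * (1.0 + math.erf((effort - mean) / (std * math.sqrt(2.0))))

if __name__ == "__main__":
    for effort in (100, 175, 250, 325, 400):
        n = normal_effort_cdf(effort, 16, 40, 64)
        e = exact_effort_cdf(effort, 16, 40, 64)
        print(f"{effort:4d} SM  normal {n:.3f}  exact {e:.3f}  difference {n - e:+.3f}")
```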


Uniform  pdfs  are  symmetrical,  and  the  pdf  of  the  sum  of  two  identical 
uniform  pdfs  is  a  symmetrical  triangle.  It  is  not  surprising  that  such  a 
sum  has  nearly  normal  distribution.  Right-triangular  pdfs  are,  however,  at 
the  extreme  of  asymmetry.  This  case  is  now  examined. 


Table 4-3. Normal CDF, True CDF, and Difference for a
Symmetrical Triangular pdf, 16K to 64K Size

Effort, SM         Normal CDF    True CDF    Difference

 70                  0.008        0.000        0.008
 78.001 (Low)        0.011        0.000        0.011
100                  0.024        0.012        0.012
125                  0.054        0.051        0.003
150                  0.106        0.117       -0.011
175                  0.188        0.205       -0.017 *
200                  0.301        0.316       -0.015
235.908 (Mean)       0.500        0.510       -0.010
250                  0.581        0.589       -0.008
275                  0.715        0.710        0.005
300                  0.824        0.809        0.015
325                  0.902        0.886        0.016
350                  0.951        0.943        0.008
375                  0.978        0.980       -0.002
400                  0.991        0.998       -0.007
411.6935 (High)      0.995        1.000       -0.005
425                  0.997        1.000       -0.003


4.2.2 A Sum from Two Right-Triangular pdfs


Assume two identical right-triangular pdfs, defined without loss of
generality on the interval $0 \le x \le 1$, as shown in figure 4-1 below.


Figure 4-1. A Right-Triangular Probability
Density Function
The pdf of the sum of the two variates is

\[
f(S) =
\begin{cases}
0, & S < 0 \\
\int_{0}^{S} 4(1-x)(1-S+x)\, dx = 2S(6 - 6S + S^2)/3, & 0 \le S \le 1 \\
\int_{S-1}^{1} 4(1-x)(1-S+x)\, dx = 2(8 - 12S + 6S^2 - S^3)/3, & 1 \le S \le 2 \\
0, & 2 < S
\end{cases}
\tag{4-4}
\]

As before, integration of (4-4) yields the CDF of the sum as

\[
F(S) =
\begin{cases}
0, & S < 0 \\
S^2(S^2 - 8S + 12)/6, & 0 \le S \le 1 \\
(32S - 24S^2 + 8S^3 - S^4 - 10)/6, & 1 \le S \le 2 \\
1, & 2 < S
\end{cases}
\tag{4-5}
\]



Following the procedure of [4], (4-5) may be transformed to the
required form by the substitutions

\[
S = [t - (L_1 + L_2)]/R, \quad R = H - L, \quad \text{and} \quad (D/\bar{a})^{1/b} = t
\tag{4-6}
\]

where, using the notation of section 2,

\[
\bar{a} = a \Big( \sum_i I_i \prod_j M_{ij} \Big) / I
\tag{4-7}
\]

The CDF of the sum is calculated as follows:

a. Calculate $\bar{a}$ from (4-7) and then select a value for D;

b. Using (4-6), calculate t and then S;

c. Given S, calculate the CDF from (4-5).

The  cumulative  probability  distribution  thus  generated  is  compared  to 
the  normal  CDF  in  table  4-4.  The  mean  and  variance  used  for  calculating 
the  normal  CDF  are  calculated  here. 
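
Steps a through c can be sketched as follows. This is a minimal illustration under the assumptions of this example (nominal DEMs, so that the coefficient obtained from (4-7) reduces to a = 2.8, with b = 1.2 and both sizes on the range 16 to 64 KDSI); the function names are hypothetical.

```python
A, B = 2.8, 1.2          # COCOMO coefficients; nominal DEMs assumed
LOW, HIGH = 16.0, 64.0   # size range of each right-triangular pdf, KDSI

def cdf_unit_sum(s):
    """Equation (4-5): CDF of the sum of two right-triangular variates on [0, 1]."""
    if s <= 0.0:
        return 0.0
    if s <= 1.0:
        return s * s * (s * s - 8.0 * s + 12.0) / 6.0
    if s <= 2.0:
        return (32.0 * s - 24.0 * s**2 + 8.0 * s**3 - s**4 - 10.0) / 6.0
    return 1.0

def exact_effort_cdf(effort, a=A, b=B, low=LOW, high=HIGH):
    """Steps a-c: effort D -> total size t -> normalized sum S -> CDF."""
    t = (effort / a) ** (1.0 / b)          # step b: total size t from (4-6)
    s = (t - 2.0 * low) / (high - low)     # step b: S = [t - (L1 + L2)] / R
    return cdf_unit_sum(s)                 # step c: evaluate (4-5)

if __name__ == "__main__":
    for effort in (200, 300, 400, 500, 600):
        print(f"{effort} SM: exact CDF = {exact_effort_cdf(effort):.4f}")
```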

The size data for the assumed right-triangular pdfs are:

Low=L1=L2=16 KDSI, High=H1=H2=64 KDSI.
Mean sizes of the pdfs are, from (4-4):

\[ \bar{I}_1 = \bar{I}_2 = (16 + 16 + 64)/3 = 32 \text{ KDSI.} \]

Size standard deviations are, from (4-6):

\[ \sigma_{I_1} = \sigma_{I_2} = \{[(16^2 + 16^2 + 64^2) - (16 \cdot 16 + 16 \cdot 64 + 16 \cdot 64)]/18\}^{1/2} = 8\sqrt{2}. \]

The DEMs have been assumed nominal and without uncertainty so that
their products are unity and their variances are zero; then the point
estimate $D_0 = N$ as a consequence. The point estimate of effort is

\[ D_0 = 2.8(32 + 32)^{1.2} = 411.693 \text{ SM} \]

The first and second derivatives with respect to size are, from
(2-11)

\[ \frac{\partial D}{\partial I_1} = \frac{\partial D}{\partial I_2} = [(1.2 - 1)411.693 + 411.693]/64 = 7.719 \text{ SM/KDSI} \]

and, from (2-12)

\[ \frac{\partial^2 D}{\partial I_1^2} = \frac{\partial^2 D}{\partial I_2^2} = \frac{(1.2 - 1)[(1.2 - 2)411.693 + 2 \cdot 411.693]}{64 \cdot 64} = 0.024 \]

The estimated mean effort is, from (2-14)

\[ \bar{D} = 411.69 + 0.024[(8\sqrt{2})^2 + (8\sqrt{2})^2]/2 = 411.69 + 3.07 = 414.8 \]

The exact mean effort is 421.0 SM, from [4].

The estimated standard deviation of effort is, from (2-15),

\[ \sigma_D = \frac{(1.2 - 1)411.69 + 411.69}{64} [(8\sqrt{2})^2 + (8\sqrt{2})^2]^{1/2} = 123.5 \]

The exact standard deviation is 125.1, from [4].

The  estimated  mean  and  standard  deviation  of  effort  are  used  to 
calculate  the  Taylor  series  model's  CDF,  which  is  normal  under  the  CLT 
hypothesis;  this  CDF  is  presented  in  table  4-4,  below.  The  exact  CDF  is 
also  presented,  together  with  the  error  of  probability  between  the  two  CDFs 
at  the  same  effort. 
Table 4-4. Exact and Normal Cumulative Probability
Distributions of Effort for the Sum of
Variates from Two Right-Triangular pdfs

No. of
Std. Devs.    Effort    Normal CDF    Exact CDF    Difference

 -1.98        170        0.0237        0.0000        0.0237
 -1.91        179.2      0.0282        0.0000        0.0282
 -1.74        200        0.0410        0.0078        0.0332
 -1.54        225        0.0622        0.0352        0.0270
 -1.33        250        0.0911        0.0783        0.0128
 -1.13        275        0.1289        0.1334       -0.0045
 -0.93        300        0.1764        0.1975       -0.0211
 -0.73        325        0.2336        0.2676       -0.0340
 -0.52        350        0.2999        0.3415       -0.0416
 -0.32        375        0.3737        0.4170       -0.0433 *
 -0.12        400        0.4524        0.4922       -0.0398
  0.083       425        0.5330        0.5653       -0.0323
  0.29        450        0.6122        0.6350       -0.0228
  0.49        475        0.6871        0.6997       -0.0126
  0.69        500        0.7549        0.7582       -0.0033
  0.89        525        0.8139        0.8096        0.0043
  1.09        550        0.8632        0.8528        0.0104
  1.30        575        0.9027        0.8881        0.0146
  1.50        600        0.9331        0.9163        0.0168
  1.70        625        0.9556        0.9388        0.0168
  1.90        650        0.9716        0.9562        0.0154


The  difference  is  relatively  small  everywhere  except  in  the  region 
from  0.52  to  0.12  standard  deviations  below  the  mean  effort;  the  difference 
of  greatest  magnitude  is  marked  by  a  *.  This  region  is  of  relatively 
little  interest,  for  the  principal  concern  of  a  project  manager  is  usually 
in  the  over-run  range  of  cumulative  probabilities  from  70  to  90  percent. 

The differences in column 5 of table 4-4 are the errors of the
cumulative probability distribution function as a function of effort.
These errors may be converted to errors of the estimate of effort as a
function of the cumulative probability; this structure of errors is
presented in table 4-5, below.
Table 4-5. Errors of Estimated Effort

Probability            0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9
Error of Effort, SM     +6   -10   -15   -13   -11    -8    -4    -2    +9


The maximum error of effort in table 4-5 is -15 SM; this error is
less than five percent of the point estimate (411 SM), and is also less
than 1/3 of the approximately 62 SM error of the COCOMO model. Further,
the errors in the region of interest from 50 to 90 percent probability are
even smaller. Therefore, even in this extreme case the error of the Taylor
series model is acceptably small compared to the selected criterion.
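
The conversion from errors of probability to errors of effort can be sketched by inverting both CDFs at the same cumulative probability. The sketch below assumes the Taylor series values for this example (mean 414.8 SM, standard deviation 123.5 SM), inverts the exact CDF of (4-5) by bisection, and reports the exact effort minus the normal-model effort. That sign convention is an assumption, and the computed values need not match table 4-5 to the last unit, since the way the tabulated values were obtained is not stated. The function names are illustrative.

```python
from statistics import NormalDist

A, B = 2.8, 1.2
LOW, HIGH = 16.0, 64.0
MEAN, STD = 414.8, 123.5   # Taylor series mean and std. dev. of effort, SM

def exact_effort_cdf(effort):
    """Exact CDF of effort for the sum of two right-triangular sizes, from (4-5)."""
    s = ((effort / A) ** (1.0 / B) - 2.0 * LOW) / (HIGH - LOW)
    if s <= 0.0:
        return 0.0
    if s <= 1.0:
        return s * s * (s * s - 8.0 * s + 12.0) / 6.0
    if s <= 2.0:
        return (32.0 * s - 24.0 * s**2 + 8.0 * s**3 - s**4 - 10.0) / 6.0
    return 1.0

def exact_effort_quantile(p, tol=1e-6):
    """Invert the exact CDF by bisection between the lowest and highest effort."""
    lo, hi = A * (2.0 * LOW) ** B, A * (2.0 * HIGH) ** B
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if exact_effort_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def effort_error(p):
    """Exact effort minus normal-model effort at cumulative probability p."""
    return exact_effort_quantile(p) - NormalDist(MEAN, STD).inv_cdf(p)

if __name__ == "__main__":
    for k in range(1, 10):
        p = k / 10.0
        print(f"p = {p:.1f}: error of effort = {effort_error(p):+.1f} SM")
```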

The accuracy of this method improves as the number of error sources
increases.


4.3 INFERENCES

It  has  been  shown  that  the  sum  of  variables  from  either  two  uniform, 
or  two  right-triangular,  identical  pdfs  yields  a  CDF  which  is  sufficiently 
close  to  normal  to  be  accepted  as  such,  under  the  selection  criterion 
proposed  above.  It  will  now  be  shown  that  the  CLT  may  be  used  to  extend 
the examples to the more general situation of more than two contributing
variances.

Assume that the variance of the system is comprised of two large and
equal variances, plus one or more smaller contributors. The effect of
these smaller contributors is to make the pdf and CDF of the overall system
closer to normal, by virtue of the action of the CLT. Alternately, assume
that instead of one or more small contributors there is only the error of
the COCOMO model itself. But this error is, on inspection, nearly normal,
taking into account that there are only 63 cases in the COCOMO database,
and therefore the CDF of the sum of the two large contributors and the
error of the model is more nearly normally distributed than merely the sum
of the two principal contributors.

The  sum  of  variables  from  two  general  triangles  with  the  same 
size-range  cannot  be  further  from  normal  than  the  sum  from  two  identical 
right-triangular  pdfs.  Therefore,  the  example  considered  above  is  a  worst 
case;  any  other  combination  of  triangles  must  be  more  nearly  normal. 
It is therefore concluded that the conjecture proposed at the
beginning of this section is demonstrated. The conclusions are:

a.  The  Taylor  series  method  may  legitimately  be  used  in  all  cases 
to  determine  the  mean  and  standard  deviation  of  effort; 

b.  The  Taylor  series/CLT  method  may  be  used  to  find  the  cumulative 
distribution  of  effort  whenever  the  largest  contributing 
variance  due  to  size  or  to  a  DEM  does  not  exceed  half  of  the  total 
variance  of  effort. 

It  should  be  observed  that  the  criterion  stated  above  is  not  valid  in 
general.  However,  it  is  at  present  valid  in  software  effort  analysis 
because  the  error  of  software  effort  estimating  models  is  large.  If  a 
software  effort  estimating  model  should  appear  with  much  smaller  errors, 
then  this  method  of  effort  uncertainty  analysis,  and  the  validity  criterion 
which  was  offered,  must  be  reconsidered. 

P. R. Garvey asserts that his method [4] for calculating the effects
of uncertainties of size can be extended from the single-CSCI case, for
which it is exact, to the multiple-CSCI case by use of the CLT. The
analysis in this section implies that the validity of his assertion
requires satisfying the same criterion used here for the Taylor series
method. When that criterion is met, his assertion is justified, and his
approach may be used for the size aspects of a multiple-CSCI case.


4.4  SUGGESTIONS  FOR  THE  SOFTWARE  ANALYST 

Some  suggestions  for  the  software  cost  analyst  are  offered. 

Size:  The uncertainty of size of a software project is primarily at
the CSCI level, and is related to the number, rather than the
size, of modules at the lowest level of the hierarchy of
components. Triangular or uniform pdfs are reasonable ways to
describe size uncertainty.

DEMs:  At least some DEMs may reasonably be assumed to have
uncertainties. The structure of the algorithms which have been
developed enables setting the uncertainty of a DEM directly as,
for example, "standard deviation for DEM 36 = 3 percent, i.e.,
0.03." This follows from the definition of DEMs as multipliers
with nominal value of unity; see also the further discussion of
this point in section 5. The analyst may therefore express an
opinion of the uncertainty of selection of the DEM as a
fraction of the DEM's value, or may use the discrete pdf approach
outlined in section 4.3; the latter is recommended.
SECTION 5


EXAMPLE


An example of the computation procedure is presented in this section.
A system comprised of three CSCIs is assumed. The sizes of the CSCIs, and
the values of three DEMs in each CSCI, are assumed to be uncertain. The
mean effort and the standard deviation of effort will be evaluated, and the
normal probability cumulative distribution function which results will be
determined.

The  procedure  is  conducted  by  using  work  sheets  which  are  specifically 
designed  for  Intermediate  COCOMO. 

The  computation  work  sheets  are  of  two  distinct  types.  The  first  type 
is  used  to  compute  the  mean  values  of  the  DEMs,  their  standard  deviations, 
and  the  other  related  quantities.  One  of  these  forms  is  filled  out  for 
each  CSCI.  The  second  form  is  used  to  calculate  the  mean  and  standard 
deviation  of  effort,  including  the  effects  of  size  and  of  DEMs;  this  form 
is  filled  out  once  for  the  project. 


5.1 DATA

The  data  for  the  example  are  gathered  here.  Table  5-1  presents  the 
assumed  uncertainties  of  size  for  the  three  CSCIs. 


Table 5-1. Assumed Size Uncertainties

CSCI #    Type of Distribution    Low-Size    Mode-Size    High-Size
                                  L           M            H

1         Triangular              16          32           64
2         Right-Triangular        40          40           80
3         Uniform                 40          -            80

No assumed value is entered in table 5-1 for the mode for CSCI 3 as its pdf is
to be uniform. The data from table 5-1 will be used in figure 5-6.
Table 5-2 presents the assumed uncertainties for three DEMs in each of
the three CSCIs. DEMs not entered in table 5-2 are assumed to be nominal,
with value unity and no uncertainty. Consistent with the prior
discussions, the DEMs' uncertainties are described by discrete distributions.


Table 5-2. Assumed Uncertainties of DEMs

CSCI #    DEM Name    DEM Probabilities (listed in order of increasing rating,
                      from among Very Low, Low, Nom, High, Very High, Extra High)

1         CPLX        0.05    0.10    0.25    0.35    0.20    0.05
          TIME        0.50    0.50
          ACAP        0.10    0.30    0.50    0.10
2         DATA        0.70    0.30
          STOR        0.60    0.40
          PCAP        0.50    0.50
3         RELY        0.50    0.50
          AEXP        0.25    0.50    0.25
          LEXP        0.20    0.70    0.10

These data will be used in figures 5-2 through 5-4.


5.2  REORGANIZATION  OF  MEAN  AND  VARIANCE  FORMULATIONS 

It  is  useful,  for  computational  purposes,  to  reorganize  the  equations 
for  mean  effort  and  its  standard  deviation.  The  modified  forms  are 
presented  here;  it  will  be  observed  that  the  changes  are: 

Size:  The first partial derivative with respect to size is
premultiplied by mean size, and the standard deviation of size is
divided by the mean size; with the second partials, the square
of size is used.

DEM:   The partial derivative with respect to a DEM is multiplied by
that DEM's mean value, and the standard deviation of the DEM is
divided thereby.
These changes are thus only of form. The equations for mean effort and
variance of effort become

Mean effort:

\[
\bar{D} = D_0 + \sum_i (1/2)\{(b-1)[(b-2)D_0 + 2NM_i]\}(\sigma_{I_i}/I)^2
\tag{5-1}
\]

Variance of effort:

\[
\sigma_D^2 = \sum_i [(b-1)D_0 + NM_i]^2 (\sigma_{I_i}/I)^2 + \sum_i \sum_j D_i^2 (\sigma_{M_{ij}}/M_{ij})^2
\tag{5-2}
\]

The inner summation of the second term of (5-2) may be simplified as

\[
(\sigma_{M_i}/M_i)^2 = \sum_j (\sigma_{M_{ij}}/M_{ij})^2
\tag{5-3}
\]

The notation is simplified by defining

\[
S_i = (b-1)D_0 + NM_i
\tag{5-4}
\]

\[
T_i = (b-1)[(b-2)D_0 + 2NM_i]
\tag{5-5}
\]
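
Equations (5-1) through (5-5) can be sketched as a short routine. The definitions used here are assumptions inferred from the worked example of section 4.2.2 and are not restated in this section: I is the total mean size, M_i is the product of the mean DEMs of CSCI i, N = a*I^b is the nominal effort, D_0 = a*I^(b-1)*sum(I_i*M_i) is the point estimate, and D_i = a*I^(b-1)*I_i*M_i is the share of effort of CSCI i. The function and field names are illustrative.

```python
import math

def effort_moments(cscis, a=2.8, b=1.2):
    """Point estimate, mean, and std. dev. of effort from (5-1) and (5-2).

    Each CSCI is a dict with these (assumed) fields:
      size_mean, size_var : mean and variance of size, KDSI
      dem_product         : product of the mean DEMs, M_i
      dem_rel_var         : sum over DEMs of (sigma_Mij / M_ij)**2, as in (5-3)
    """
    total = sum(c["size_mean"] for c in cscis)               # I, total mean size
    nominal = a * total ** b                                 # N (assumed definition)
    weighted = sum(c["size_mean"] * c["dem_product"] for c in cscis)
    d0 = a * total ** (b - 1.0) * weighted                   # point estimate D0

    mean, var = d0, 0.0
    for c in cscis:
        rel_size_var = c["size_var"] / total ** 2            # (sigma_Ii / I)**2
        s_i = (b - 1.0) * d0 + nominal * c["dem_product"]    # (5-4)
        t_i = (b - 1.0) * ((b - 2.0) * d0
                           + 2.0 * nominal * c["dem_product"])  # (5-5)
        d_i = a * total ** (b - 1.0) * c["size_mean"] * c["dem_product"]
        mean += 0.5 * t_i * rel_size_var                     # term of (5-1)
        var += s_i ** 2 * rel_size_var                       # size term of (5-2)
        var += d_i ** 2 * c["dem_rel_var"]                   # DEM term of (5-2)
    return d0, mean, math.sqrt(var)

if __name__ == "__main__":
    # Two identical right-triangular sizes (16/16/64 KDSI) with nominal DEMs,
    # as in section 4.2.2: mean size 32 KDSI, variance 128.
    csci = {"size_mean": 32.0, "size_var": 128.0,
            "dem_product": 1.0, "dem_rel_var": 0.0}
    d0, mean, std = effort_moments([csci, csci])
    print(f"D0 = {d0:.1f} SM, mean = {mean:.1f} SM, std. dev. = {std:.1f} SM")
```

Applied to the two right-triangular sizes of section 4.2.2, the routine can be checked against the values worked out there by hand.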


The equations for calculating the mean and variances of uniform,
triangular, and discrete pdfs are gathered here for convenience.

Means

Uniform pdf
\[ \bar{x} = (L + H)/2 \tag{5-6} \]

Triangular pdf
\[ \bar{x} = (L + M + H)/3 \tag{5-7} \]

Discrete pdf
\[ \bar{x} = \sum_k x_k P_k \tag{5-8} \]

Variances

Uniform pdf
\[ \sigma^2 = (H - L)^2/12 \tag{5-9} \]

Triangular pdf
\[ \sigma^2 = [(L^2 + M^2 + H^2) - (LM + LH + MH)]/18 \tag{5-10} \]

Discrete pdf
\[ \sigma^2 = \sum_k (x_k - \bar{x})^2 P_k \tag{5-11} \]

where $x_k$ are the values of the ratings of a DEM, and $P_k$ are the
probabilities of these ratings.
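
The formulas (5-6) through (5-11) translate directly into code. The following is a minimal sketch; the function names are illustrative, and the CPLX multiplier values in the usage example are assumed from the Intermediate COCOMO table in [1] and should be checked against that source.

```python
def uniform_moments(low, high):
    """Mean (5-6) and variance (5-9) of a uniform pdf."""
    return (low + high) / 2.0, (high - low) ** 2 / 12.0

def triangular_moments(low, mode, high):
    """Mean (5-7) and variance (5-10) of a triangular pdf."""
    mean = (low + mode + high) / 3.0
    var = ((low**2 + mode**2 + high**2)
           - (low * mode + low * high + mode * high)) / 18.0
    return mean, var

def discrete_moments(values, probs):
    """Mean (5-8) and variance (5-11) of a discrete pdf."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    mean = sum(x * p for x, p in zip(values, probs))
    var = sum((x - mean) ** 2 * p for x, p in zip(values, probs))
    return mean, var

if __name__ == "__main__":
    print(triangular_moments(16, 32, 64))    # CSCI 1 of table 5-1
    print(triangular_moments(40, 40, 80))    # CSCI 2, right-triangular
    print(uniform_moments(40, 80))           # CSCI 3
    # CPLX probabilities from table 5-2; the multiplier values are an
    # assumption taken from the Intermediate COCOMO ratings (Very Low
    # through Extra High) and are not given in this section.
    cplx_values = [0.70, 0.85, 1.00, 1.15, 1.30, 1.65]
    cplx_probs = [0.05, 0.10, 0.25, 0.35, 0.20, 0.05]
    print(discrete_moments(cplx_values, cplx_probs))
```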
5.3 MEAN AND STANDARD DEVIATION OF DEMs


Computation of the mean, standard deviation, and CDF of effort
requires computation of the means and standard deviations of the DEMs.
Figures 5-1 through 5-4 present the computation of the mean and standard
deviation of effort due to the uncertainty of the DEMs. Figure 5-1 is a
detailed list of instructions, while figures 5-2, 5-3, and 5-4 present the
actual calculations and data for the DEMs of the three CSCIs, consistent
with the data in table 5-2 and the second term in (5-2), or the
reformulation in (5-3).

The  output  from  this  stage  of  the  computation  appears  at  the  bottom  of 
columns  13,  14,  and  15  on  figures  5-2  through  5-4.  These  outputs  will  be 
used  in  computing  the  standard  deviation  of  effort  in  figures  5-5  and  5-6, 
which  will  be  discussed  next. 


5.4  MEAN  AND  STANDARD  DEVIATION  OF  EFFORT 

Computation  of  the  mean  and  standard  deviation  of  effort,  and  its  CDF, 
requires  computation  of  the  means  and  standard  deviations  of  size  of  the 
CSCIs.  Figure  5-5  is  the  detailed  instruction  set  for  using  figure  5-6. 
Figure  5-6  is  used  for  calculating  the  point  estimate  of  effort,  the 
estimated  mean  effort,  and  the  standard  deviation  of  effort.  The  size  data 
from  table  5-1  are  entered  into  columns  1,  2,  and  3.  The  result  of  the 
computations  on  the  DEMs,  from  figures  5-2,  5-3,  and  5-4,  are  entered  into 
column  6.  Figure  5-5  presents  the  procedure  for  using  this  form.  Any 
standard table of the normal probability function may be used to find the
CDF; alternately, see a procedure in [5].


5.5  CUMULATIVE  DISTRIBUTION  FUNCTION  OF  EFFORT 

Given the mean effort, its standard deviation, and the assumption that
the distribution is normal, the CDF is easily calculated by using a
standard table of the normal probability function. This table is entered with
x, the number of standard deviations between any value of effort, D', and
the mean effort, and is defined as

\[ x = (D' - \bar{D})/\sigma_D \tag{5-12} \]

Table  5-3  presents  the  probability  cumulative  distribution  function, 
CDF,  which  follows  from  the  results  of  the  computations  in  figure  5-6;  that 
figure  shows  the  mean  effort  to  be  1412  SM,  and  the  standard  deviation  of 
effort  to  be  322.9  SM. 
Figure 5-1. Instructions for Calculating Mean and Standard Deviation of DEMs

Figure 5-2. Calculation of Mean and Standard Deviation of DEMs in CSCI

Figure 5-3. Calculation of Mean and Standard Deviation of DEMs in CSCI

Figure 5-4. Calculation of Mean and Standard Deviation of DEMs in CSCI

Figure 5-5. Instructions for Computation of Mean and Standard Deviation of Effort

Figure 5-6. Calculation of Mean and Standard Deviation of Effort

[Figures 5-1 through 5-6 are instruction sheets and work sheets. Legible work
sheet entries: mode Embedded; coefficients a = 2.8, b = 1.20; figure 5-6 column
groups: Size Data; Mean Size & Effort; Std. Dev. of Eff. due to Size & DEMs.]


Table 5-3. Cumulative Probability vs. Effort

Standard           Effort,    Cumulative
Deviations, x      SM         Probability, Percent

 -1.28             1000        10.1
 -1.12             1050        13.1
 -0.97             1100        16.7
 -0.81             1150        20.9
 -0.66             1200        25.6
 -0.50             1250        30.8
 -0.35             1300        36.4
 -0.19             1350        42.4
 -0.04             1400        48.5
  0.00             1412        50.0
  0.12             1450        54.7
  0.27             1500        60.7
  0.43             1550        66.5
  0.58             1600        72.0
  0.74             1650        77.0
  0.89             1700        81.4
  1.05             1750        85.2
  1.20             1800        88.5
  1.36             1850        91.3
  1.51             1900        93.5
  1.67             1950        95.2
  1.82             2000        96.6


Thus,  for  example,  the  probability  that  the  project  will  require  not  more 
than  1700  SM  is  81.4  percent  while,  by  interpolation,  it  is  70  percent 
probable  that  the  effort  will  not  exceed  1580  SM. 
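
The construction of table 5-3 and the interpolated 70 percent value can be sketched with the normal distribution in the Python standard library. The mean and standard deviation are the values quoted from figure 5-6; the function names are illustrative.

```python
from statistics import NormalDist

MEAN_EFFORT = 1412.0   # SM, from figure 5-6
STD_EFFORT = 322.9     # SM, from figure 5-6

effort_dist = NormalDist(MEAN_EFFORT, STD_EFFORT)

def cumulative_percent(effort):
    """Probability, in percent, that the effort does not exceed `effort`."""
    return 100.0 * effort_dist.cdf(effort)

def effort_at_percent(percent):
    """Effort, in SM, that is not exceeded with the given probability."""
    return effort_dist.inv_cdf(percent / 100.0)

if __name__ == "__main__":
    for effort in range(1000, 2001, 50):
        x = (effort - MEAN_EFFORT) / STD_EFFORT       # equation (5-12)
        print(f"{x:6.2f}  {effort:5d}  {cumulative_percent(effort):5.1f}")
    print(f"70 percent: {effort_at_percent(70.0):.0f} SM")
```

Using the inverse CDF directly avoids the interpolation step, at the cost of relying on a library routine rather than a printed table.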

Alternately, normal probability graph paper may be used. On this type
of paper a normal CDF is a straight line. Enter the following pairs on the
paper: (($\bar{D} - \sigma_D$), 15.9%), ($\bar{D}$, 50%), and (($\bar{D} + \sigma_D$), 84.1%); the three points
will lie on a straight line from which table 5-3 may be formed directly;
see figure 5-7. As a further alternate, an excellent algorithm for the
normal CDF is given in [5].


[Plot on normal probability paper: cumulative probability versus effort, D',
from 1000 to 1900 SM.]

Figure 5-7. Cumulative Distribution Function of Effort
APPENDIX A


This  appendix  gathers  the  various  computation  forms  which  were  used  in 
section  5,  together  with  the  various  instruction  sheets.  This  appendix 
therefore  contains  all  the  material  necessary  to  perform  a  complete 
software  effort  uncertainty  analysis,  or  to  form  spreadsheets  for  the 
computation. 
Figure A-1. Instructions for Calculating Mean and Standard Deviation of DEMs

Figure A-2. Form for Computation of Mean and Standard Deviation of DEMs

Figure A-3. (Concluded)

Figure A-5. Normal Cumulative Probability Paper